Maoyuan Ye

VTAgent: Agentic Keyframe Anchoring for Evidence-Aware Video TextVQA

May 06, 2026

ET-SAM: Efficient Point Prompt Prediction in SAM for Unified Scene Text Detection and Layout Analysis

Mar 26, 2026

Adapting Segment Anything Model for Power Transmission Corridor Hazard Segmentation

May 28, 2025

GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking

May 28, 2025

Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues?

May 19, 2025

LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?

May 18, 2025

Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

Jan 31, 2024

GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching

Jan 13, 2024

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Text Spotting

May 31, 2023

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

Nov 23, 2022